Noisy Data Make the Partial Digest Problem NP-hard

Authors

  • Mark Cieliebak
  • Stephan Eidenbenz
  • Paolo Penna
Abstract

The problem of finding the coordinates of n points on a line such that the pairwise distances of the points form a given multi-set of $\binom{n}{2}$ distances is known as the Partial Digest problem; it occurs, for instance, in DNA physical mapping and in de novo sequencing of proteins. Although Partial Digest was already proposed as a combinatorial problem in the 1930s, its computational complexity is still unknown. In an effort to model real-life data, we introduce two optimization variations of Partial Digest that model two different types of error occurring in real-life data. First, we study the computational complexity of a minimization version of Partial Digest in which only a subset of all pairwise distances is given and the rest are missing due to experimental errors. We show that this variation is NP-hard to solve exactly. This result answers an open question posed by Pevzner (2000). We then study a maximization version of Partial Digest in which a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of $|D|^{1/2-\varepsilon}$ for any $\varepsilon > 0$, where $|D|$ is the number of input distances. This inapproximability result is tight up to low-order terms, as we give a trivial approximation algorithm that achieves a matching approximation ratio.
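The definitions above can be made concrete with a short Python sketch. It computes the multiset Δ(P) of pairwise distances of a point set P and checks a candidate P against a distance multiset D in three ways: exact Partial Digest (Δ(P) equals D), the missing-distance setting (every given distance appears in Δ(P)), and the extra-distance setting (Δ(P) is contained in D). The function names, treating the given distances as a set in the missing-distance check, and the two-point heuristic are illustrative assumptions, not the paper's formal definitions or its approximation algorithm. Checking a candidate is easy; the hardness results concern finding a good P.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def distance_multiset(points: Iterable[int]) -> Counter:
    """Delta(P): the multiset of |p - q| over all unordered pairs of points."""
    return Counter(abs(p - q) for p, q in combinations(list(points), 2))


def solves_partial_digest(points: Iterable[int], D: Iterable[int]) -> bool:
    """Error-free Partial Digest: Delta(P) must equal D as multisets."""
    return distance_multiset(points) == Counter(D)


def covers_all(points: Iterable[int], D: Iterable[int]) -> bool:
    """Missing-distance setting (assumption: multiplicities ignored):
    every given distance must occur somewhere in Delta(P)."""
    return set(D) <= set(distance_multiset(points))


def fits_inside(points: Iterable[int], D: Iterable[int]) -> bool:
    """Extra-distance setting: Delta(P) must be a sub-multiset of D."""
    available = Counter(D)
    needed = distance_multiset(points)
    return all(available[d] >= count for d, count in needed.items())


def trivial_subset_solution(D: Iterable[int]) -> List[int]:
    """Hypothetical two-point heuristic: points 0 and d use exactly one
    distance d from D, so they always fit inside a non-empty D."""
    d = next(iter(D))  # raises StopIteration if D is empty
    return [0, d]


if __name__ == "__main__":
    D = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
    print(solves_partial_digest([0, 2, 4, 7, 10], D))  # True
    print(fits_inside([0, 2, 4], [2, 4, 9]))  # False: needs distance 2 twice
```

Assuming distinct points, any k points produce k(k−1)/2 pairwise distances, so a point set that fits inside D satisfies k(k−1)/2 ≤ |D|, giving k = O(|D|^{1/2}). Returning two points is therefore within a factor O(|D|^{1/2}) of the optimum, which is consistent with the order of the matching ratio mentioned in the abstract.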

Similar resources

Noisy Data Make the Partial Digest Problem NP-hard (TECHNICAL)

The Partial Digest problem, well known for its applications in computational biology and for the intriguingly open status of its computational complexity, asks for the coordinates of n points on a line such that the pairwise distances of the points form a given multi-set of $\binom{n}{2}$ distances. In an effort to model real-life data, we study the computational complexity of a minimization version of...

Measurement Errors Make the Partial Digest Problem NP-Hard

The Partial Digest problem asks for the coordinates of m points on a line such that the pairwise distances of the points form a given

Modeling of Partial Digest Problem as a Network flows problem

Restriction Site Mapping is one of the interesting tasks in Computational Biology. A DNA strand can be thought of as a string on the letters A, T, C, and G. When a particular restriction enzyme is added to a DNA solution, the DNA is cut at particular restriction sites. The goal of the restriction site mapping is to determine the location of every site for a given enzyme. In partial digest metho...
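To make the restriction-site description concrete, here is a small Python sketch under simplifying assumptions: the strand is a plain string, the enzyme's recognition sequence is given as a substring such as GAATTC (the EcoRI site), and each cut is placed at the start of an occurrence rather than at the enzyme's real cut offset. An idealized partial digest is modeled as yielding one fragment for every pair of cut points, with the two ends of the strand counted as cut points. The helper names are hypothetical.

```python
from itertools import combinations
from typing import List


def cut_positions(dna: str, site: str) -> List[int]:
    """Start index of every occurrence of the recognition sequence.
    Simplification: the cut is placed at the start of the site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def partial_digest_fragments(dna: str, site: str) -> List[int]:
    """Idealized partial digest: one fragment for every pair of cut points,
    where the two ends of the strand also count as cut points."""
    points = sorted({0, len(dna), *cut_positions(dna, site)})
    return sorted(b - a for a, b in combinations(points, 2))


if __name__ == "__main__":
    dna = "AAGAATTCAAAAGAATTCAA"
    print(cut_positions(dna, "GAATTC"))             # [2, 12]
    print(partial_digest_fragments(dna, "GAATTC"))  # [2, 8, 10, 12, 18, 20]
```

Recovering the points 0, 2, 12, 20 (up to translation and mirror image) from the fragment lengths alone is exactly a Partial Digest instance.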

Partial Digest is hard to solve for erroneous input data

The Partial Digest problem asks for the coordinates of m points on a line such that the pairwise distances of the points form a given multiset of $\binom{m}{2}$ distances. Partial Digest is a well-studied problem with important applications in physical mapping of DNA molecules. Its computational complexity status is open. Input data for Partial Digest from real-life experiments are always prone to erro...

Double Digest Revisited: Complexity and Approximability in the Presence of Noisy Data

We revisit the double digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that double digest is strongly NP-complete, improving previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites ...
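In the usual formulation of double digest, the input consists of three multisets of fragment lengths: from digesting with enzyme A alone, with enzyme B alone, and with both enzymes together. Each digestion is complete, so only the pieces between consecutive cuts appear. The Python sketch below generates these three multisets from known cut positions and tests for coincident cut sites, the case that the variation mentioned above disallows. The function names and example positions are hypothetical.

```python
from typing import List, Sequence, Tuple


def complete_digest(cuts: Sequence[int], length: int) -> List[int]:
    """Fragment lengths when the strand is cut at every site: only pieces
    between consecutive cut points (including both ends) are produced."""
    points = sorted({0, length, *cuts})
    return sorted(b - a for a, b in zip(points, points[1:]))


def double_digest_data(
    cuts_a: Sequence[int], cuts_b: Sequence[int], length: int
) -> Tuple[List[int], List[int], List[int]]:
    """The three fragment multisets: enzyme A alone, B alone, both together."""
    return (
        complete_digest(cuts_a, length),
        complete_digest(cuts_b, length),
        complete_digest(list(cuts_a) + list(cuts_b), length),
    )


def has_coincident_sites(cuts_a: Sequence[int], cuts_b: Sequence[int]) -> bool:
    """True if the two enzymes share at least one cut position."""
    return bool(set(cuts_a) & set(cuts_b))


if __name__ == "__main__":
    print(double_digest_data([5, 12], [8, 15], 20))
    # ([5, 7, 8], [5, 7, 8], [3, 3, 4, 5, 5])
    print(has_coincident_sites([5, 12], [8, 15]))  # False
```

The reconstruction task runs in the opposite direction: given only the three lists, find cut positions for A and B that reproduce them.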

Publication date: 2003